Effects of Distance between Classes and Training Datasets Size to the Performance of XCS: Case of Imbalance Datasets
Abstract
This paper analyzes how the distance between classes and the size of the training dataset affect the performance of the XCS classifier system on imbalanced datasets. Our purpose is to determine whether the loss of performance a classifier incurs on class-imbalanced problems stems from the class imbalance per se or can be explained in other ways. Experiments on 250 artificial imbalanced datasets show that XCS can perform well in some imbalanced domains if the training dataset is large enough and the distance between classes is appropriate. Thus, it does not seem fair to attribute the performance loss of XCS directly to class imbalance. The results also indicate which kinds of datasets are suitable for training XCS, and that dealing with class imbalance alone will not always improve classifier performance.
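The three factors the abstract varies (training set size, degree of imbalance, and distance between classes) can be illustrated with a small synthetic generator. The following is only a minimal sketch under stated assumptions: the abstract does not describe how the 250 artificial datasets were built, so the one-dimensional Gaussian classes, the function names `make_imbalanced_dataset` and `experiment_grid`, and all parameter values are hypothetical choices made for illustration.

```python
import random


def make_imbalanced_dataset(n_samples, minority_fraction, class_distance, seed=0):
    """Build a toy two-class dataset (hypothetical design, not the paper's).

    Class 0 (majority) is drawn from N(0, 1); class 1 (minority) is drawn
    from N(class_distance, 1), so class_distance controls class overlap.
    """
    rng = random.Random(seed)
    n_minority = max(1, round(n_samples * minority_fraction))
    n_majority = n_samples - n_minority

    points, labels = [], []
    for _ in range(n_majority):
        points.append(rng.gauss(0.0, 1.0))
        labels.append(0)
    for _ in range(n_minority):
        points.append(rng.gauss(class_distance, 1.0))
        labels.append(1)
    return points, labels


def experiment_grid(sizes, minority_fractions, class_distances):
    """Enumerate every (size, imbalance, distance) combination to test."""
    return [
        (n, f, d)
        for n in sizes
        for f in minority_fractions
        for d in class_distances
    ]


if __name__ == "__main__":
    # Assumed example values; the paper's actual settings are not given.
    for n, f, d in experiment_grid([100, 1000], [0.05, 0.1], [0.5, 2.0, 4.0]):
        points, labels = make_imbalanced_dataset(n, f, d)
        print(n, f, d, labels.count(0), labels.count(1))
```

Holding the minority fraction fixed while sweeping size and distance separates the effect of imbalance itself from the effects of sample scarcity and class overlap, which is the distinction the abstract draws.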
Publication date: 2007